Results 1 - 9 of 9
1.
20th IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2022 ; : 17-22, 2022.
Article in English | Scopus | ID: covidwho-2319669

ABSTRACT

After the COVID-induced lockdowns, augmented/virtual reality turned from a leisure activity into a desired reality. Real-time 3D audio is a crucial enabler for these technologies. Nevertheless, systems offering object spatialization in 3D audio fall into two limited categories: they either require long-running pre-renders or involve powerful computing platforms. Furthermore, they mainly focus on active audio sources, while humans rely on the sound's interactions with passive obstructions to sense their environment. We propose a hardware co-processor for real-time 3D audio spatialization supporting passive obstructions. Our solution attains latency similar to that of workstations while drawing a tenth of the power, making it suitable for embedded applications. © 2022 IEEE.
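As a rough illustration of what object spatialization means computationally (not the paper's hardware design; sample rate, positions, and function names here are illustrative), a minimal sketch renders a mono source to stereo using per-ear propagation delay and inverse-distance attenuation:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
SAMPLE_RATE = 48_000

def spatialize(mono, src, ear_left, ear_right):
    """Render a mono source at 3D position `src` to stereo using
    per-ear propagation delay and inverse-distance attenuation."""
    channels = []
    for ear in (ear_left, ear_right):
        d = np.linalg.norm(np.asarray(src, float) - np.asarray(ear, float))
        delay = int(round(d / SPEED_OF_SOUND * SAMPLE_RATE))  # samples
        gain = 1.0 / max(d, 1.0)  # clamp to avoid blow-up very close to the ear
        channels.append(np.concatenate([np.zeros(delay), gain * mono]))
    # pad both channels to equal length and stack as (n_samples, 2)
    n = max(len(c) for c in channels)
    return np.stack([np.pad(c, (0, n - len(c))) for c in channels], axis=1)
```

Handling passive obstructions, as the paper does, would additionally require modeling occlusion and reflection along each source-to-ear path.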

2.
50th ACM SIGUCCS User Services Annual Conference, SIGUCCS 2023 ; : 42-47, 2023.
Article in English | Scopus | ID: covidwho-2300153

ABSTRACT

With the outbreak of COVID-19, the lecture environment at universities has increasingly moved online. In addition to lectures delivered entirely through online tools, there are hybrid lecture environments combining online and in-person attendance. Hybrid casting environments are growing not only in the classroom but also at various conferences. In an online-only meeting environment, web conferencing tools such as Zoom and Webex can approximately achieve the objective. In a hybrid environment, however, a face-to-face environment is also necessary, and it is essential to build a setup that is aware of both online and face-to-face interaction. It is fine if the venue already has equipment that serves the purpose, but in some cases there are no facilities and the equipment must be carried in and arranged. Here, the most difficult point is the audio system configuration, which requires a certain level of technical knowledge and monetary cost. For a reasonable price, it is possible to outsource to a specialized service provider to create a perfect casting environment. In many situations, however, it is difficult to bear significant costs, and many people try to manage by trial and error. We have experienced various hybrid casting situations. Recently, we have been considering how to reduce costs from various aspects, such as "avoiding high costs in terms of manpower, equipment, and expenses," "not requiring operators to have extensive knowledge," and "minimizing the amount of equipment to be carried in by integrating with the existing equipment at the venue." In this presentation, we provide actual examples of hybrid casting environments that the author experienced, mainly with equipment brought in, set up, and operated by one person; outline the key points of these operations; and consider what kind of casting environment can reduce various costs and achieve hybrid casting more easily.
We would like to share with the SIGUCCS community what kind of total peripheral environment is needed to make hybrid delivery more familiar, beyond the delivery technology itself such as Zoom or Webex, and to think together about how it should be. © 2023 Owner/Author.

3.
Data and Knowledge Engineering ; 144, 2023.
Article in English | Scopus | ID: covidwho-2246068

ABSTRACT

Speaker diarization is the partitioning of an audio stream into homogeneous segments according to speaker identity. It can improve the readability of an automatic speech transcription by segmenting the audio stream into speaker turns and, when combined with speaker recognition systems, identifying each speaker's true identity. Automatic speaker diarization is generally done in two phases: the transformation of audio segments into feature representations, and clustering. In this paper, clustering together with a hybrid optimization technique is used to perform speaker diarization. The features extracted from the audio signal are processed by speech activity prediction to identify the speech segments. The diarization is done by Deep Embedded Clustering (DEC), whose parameters are trained by the proposed Fractional Anticorona Whale Optimization Algorithm (FrACWOA). The FrACWOA is a hybrid optimization technique designed by adapting the concept of fractional theory, the precautionary behaviour against COVID-19, and the hunting behaviour of whales. DEC performs the diarization by concurrently learning feature representations and cluster assignments with neural networks: using a mapping from the input space to a lower-dimensional feature space, DEC iteratively optimizes a clustering objective. With testing accuracy, diarization error, false discovery rate (FDR), false negative rate (FNR), and false positive rate (FPR) of 0.902, 0.627, 0.276, 0.117, and 0.118, respectively, the developed FrACWOA+DEC algorithm performed much better with six speakers on the EenaduPrathidwani dataset.
Compared with existing approaches such as Active learning, DE+K-means, LSTM, MCGAN, ANN-ABC-LA, and ACWOA+DFC, the accuracy of the proposed method is 12.97%, 10.31%, 9.75%, 7.53%, 4.32%, and 2.106% higher when using 6 speakers. © 2022 Elsevier B.V.
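The core of standard DEC (independent of the FrACWOA training scheme proposed in this paper) is a soft cluster assignment with a Student's t-kernel and a sharpened self-training target. A minimal NumPy sketch of those two computations, with illustrative names:

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """DEC soft cluster assignment: Student's t-kernel similarity between
    embedded points z (n, d) and cluster centroids (k, d), row-normalized."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target P that DEC uses as its self-training signal:
    squares the assignments and renormalizes per cluster, then per row."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)
```

Training then minimizes the KL divergence between Q and P while updating the embedding network and centroids; this paper replaces that gradient update with FrACWOA-driven optimization.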

4.
AES: Journal of the Audio Engineering Society ; 70(11):926-937, 2022.
Article in English | Scopus | ID: covidwho-2204428

ABSTRACT

Widely used videoconferencing software has been diffused even further by the social distancing measures adopted during the SARS-CoV-2 pandemic. However, none of the currently available Web-based solutions support high-fidelity stereo audio streaming, a fundamental prerequisite for networked music applications. This is mainly because the WebRTC RTCPeerConnection standard for Web-based audio streaming does not handle uncompressed audio formats. To overcome that limitation, an implementation of 16-bit pulse code modulation (PCM) stereo audio transmission on top of the WebRTC RTCDataChannel, leveraging Web Audio and AudioWorklets, is discussed. Results obtained with multiple configurations, browsers, and operating systems show that the proposed approach outperforms the WebRTC RTCPeerConnection standard in terms of audio quality and latency, which in the authors' best case to date has been reduced to only 40 ms between two MacBooks on a local area network. © 2022 Audio Engineering Society. All rights reserved.
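The 16-bit PCM payload format itself is simple; a NumPy sketch of the encode/decode step (the in-browser version would live in an AudioWorklet in JavaScript; scale factor and function names here are illustrative, not the authors' implementation):

```python
import numpy as np

def floats_to_pcm16(x):
    """Pack float samples in [-1, 1] as little-endian 16-bit PCM bytes,
    the kind of payload that would be sent over an RTCDataChannel."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    return (x * 32767.0).astype('<i2').tobytes()

def pcm16_to_floats(buf):
    """Inverse: decode a received PCM16 payload back to float samples."""
    return np.frombuffer(buf, dtype='<i2').astype(np.float64) / 32767.0
```

Stereo transmission would interleave the two channels before packing; the round trip is lossless to within one quantization step (~3e-5).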

5.
4th Celtic Language Technology Workshop, CLTW 2022 ; : 104-109, 2022.
Article in English | Scopus | ID: covidwho-2169580

ABSTRACT

This paper presents the design, collection and verification of a bilingual text-to-speech synthesis corpus for Welsh and English. The ever-expanding voice collection currently contains almost 10 hours of recordings from a bilingual, phonetically balanced text corpus. The speakers consist of a professional voice actor and three amateur contributors, with male and female accents from north and south Wales. This corpus provides audio-text pairs for building and training high-quality bilingual Welsh-English neural TTS systems. We describe the process by which we created a phonetically balanced prompt set and the challenges of attempting to collate such a dataset during the COVID-19 pandemic. Our initial findings from validating the corpus by implementing a state-of-the-art TTS model are presented. This corpus represents the first open-source Welsh language corpus large enough to capitalise on neural TTS architectures. © European Language Resources Association (ELRA)
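Phonetically balanced prompt sets are commonly approximated with a greedy set-cover heuristic: repeatedly pick the sentence that contributes the most not-yet-covered phonemes. A minimal sketch of that idea (illustrative only; the paper does not specify its selection algorithm):

```python
def select_balanced_prompts(prompts, target_size):
    """Greedy prompt selection for phonetic coverage.

    `prompts` maps each candidate sentence to its set of phonemes;
    returns up to `target_size` sentences plus the phonemes they cover.
    """
    chosen, covered = [], set()
    candidates = dict(prompts)
    while candidates and len(chosen) < target_size:
        # pick the sentence adding the most uncovered phonemes
        best = max(candidates, key=lambda s: len(candidates[s] - covered))
        covered |= candidates.pop(best)
        chosen.append(best)
    return chosen, covered
```

In practice the coverage units would be diphones or phonemes-in-context per language, derived from a grapheme-to-phoneme front end.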

6.
AES Europe Spring 2022 - 152nd Audio Engineering Society Convention 2022 ; : 176-184, 2022.
Article in English | Scopus | ID: covidwho-2011328

ABSTRACT

The human need to communicate and connect during the Covid-19 pandemic has led to the increasing use of teleconferencing applications. Users naturally pay attention to audio quality when choosing a teleconference application from the many available and easily accessible options. Audio quality is partly affected by the audio coding method used by the application and by the noise introduced within the network. This work evaluates the audio quality of five popular teleconferencing applications using a subjective test, complemented by objective tests and Signal-to-Noise Ratio (SNR) assessments. The standard used for the subjective test is ITU-R BS.1116-3 (Methods for the Subjective Assessment of Small Impairments in Audio Systems), which aims to identify small differences in audio quality. The assessment compares the original and compressed audio: the original audio is recorded on the speaker side, while the compressed audio is recorded on the receiver side. Both are assessed by 20 subjects using the subjective test method. The assessment results of the teleconferencing applications differ, even when some applications use the same codec. We also found that one of the most popular applications tends to have the lowest average score among those tested. © 2022 by the Audio Engineering Society. All rights reserved.
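The SNR assessment mentioned above can be computed directly from the two recordings, treating the speaker-side signal as the reference and the receiver-side difference as noise. A minimal sketch (assuming the two signals are already time-aligned and equal length, which in practice requires cross-correlation alignment first):

```python
import numpy as np

def snr_db(reference, degraded):
    """Signal-to-noise ratio in dB between a speaker-side reference
    recording and the time-aligned receiver-side recording."""
    reference = np.asarray(reference, dtype=np.float64)
    noise = reference - np.asarray(degraded, dtype=np.float64)
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))
```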

7.
9th International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2022 ; 13258 LNCS:114-124, 2022.
Article in English | Scopus | ID: covidwho-1899007

ABSTRACT

Estimating the capacity of a room or venue is essential to avoid overcrowding that could compromise people's safety. Having enough free space to guarantee a minimal safety distance between people is also essential for health reasons, as in the current COVID-19 pandemic. Most existing systems for automatic crowd counting are based on image or video data, some using deep learning architectures. In this paper, we study the viability of existing deep learning crowd counting systems and propose new alternatives based on network architectures with convolutional layers that rely exclusively on environmental audio signals. The proposed architecture infers the actual occupancy with higher accuracy than previous proposals. Conclusions from the accuracy obtained with our approach are drawn, and the possible scope of deep learning based crowd counting systems is discussed. © 2022, Springer Nature Switzerland AG.
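Convolutional models over audio typically consume a 2D time-frequency representation rather than the raw waveform. A minimal NumPy sketch of the front end such a system might use (frame length, hop, and function name are illustrative; the paper does not specify its feature pipeline):

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=256, eps=1e-10):
    """Log-magnitude STFT of an audio signal: a (frames, bins) image-like
    array of the kind a convolutional crowd-counting model would consume."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)
```

The convolutional layers then regress (or classify) the occupancy count from this representation.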

8.
IEEE Access ; 8: 154087-154094, 2020.
Article in English | MEDLINE | ID: covidwho-1522519

ABSTRACT

The current pandemic associated with the novel coronavirus (COVID-19) presents a new area of research with its own set of challenges. Creating unobtrusive remote monitoring tools for medical professionals that may aid in diagnosis, monitoring and contact tracing could lead to more efficient and accurate treatments, especially in this time of physical distancing. Audio-based sensing methods can address this by measuring the frequency, severity and characteristics of the COVID-19 cough. However, the feasibility of accumulating coughs directly from patients is low in the short term. This article introduces a novel database (NoCoCoDa), which contains COVID-19 cough events obtained through public media interviews with COVID-19 patients, as an interim solution. After manual segmentation of the interviews, a total of 73 individual cough events were extracted and cough phase annotation was performed. Furthermore, the COVID-19 cough is typically dry but can present as a more productive cough in severe cases. Therefore, an investigation of cough sub-type (productive vs. dry) of the NoCoCoDa was performed using methods previously published by our research group. Most of the NoCoCoDa cough events were recorded either during or after a severe period of the disease, which is supported by the fact that 77% of the COVID-19 coughs were classified as productive based on our previous work. The NoCoCoDa is designed to be used for rapid exploration and algorithm development, which can then be applied to more extensive datasets and potentially real time applications. The NoCoCoDa is available for free to the research community upon request.
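The authors segmented the interviews manually, but the same kind of event extraction is often bootstrapped automatically with a short-time energy detector. A crude sketch of that idea (frame size and threshold are illustrative, not the paper's procedure):

```python
import numpy as np

def segment_events(signal, rate, frame_ms=20, threshold_ratio=0.1):
    """Energy-based event extraction: mark frames whose short-time energy
    exceeds a fraction of the peak frame energy, then merge consecutive
    active frames into (start, end) sample ranges."""
    frame = int(rate * frame_ms / 1000)
    n = len(signal) // frame
    energy = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n)])
    active = energy > threshold_ratio * energy.max()
    events, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                      # event begins
        elif not a and start is not None:
            events.append((start * frame, i * frame))  # event ends
            start = None
    if start is not None:
        events.append((start * frame, n * frame))
    return events
```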

9.
IEEE Trans Serv Comput ; 15(3): 1220-1232, 2022 May.
Article in English | MEDLINE | ID: covidwho-1132803

ABSTRACT

In an attempt to reduce the infection rate of the COrona VIrus Disease-19 (Covid-19), countries around the world have echoed the exigency for an economical, accessible, point-of-need diagnostic test to identify Covid-19 carriers, so that those who test positive can be advised to self-isolate rather than the entire community. Availability of a quick-turn-around diagnostic test would essentially mean that life in general can return to normality at large. In this regard, studies concurrent with ours have investigated different respiratory sounds, including cough, to recognise potential Covid-19 carriers. However, these studies lack clinical control and rely on Internet users confirming their test results in a web questionnaire (crowdsourcing), rendering their analysis inadequate. We seek to evaluate the detection performance of a primary screening tool for Covid-19 based solely on the cough sound, using 8,380 clinically validated samples with laboratory molecular tests (2,339 Covid-19 positive and 6,041 Covid-19 negative) under quantitative RT-PCR (qRT-PCR) from certified laboratories. All collected samples were clinically labelled, i.e., Covid-19 positive or negative, according to these results, in addition to the disease severity based on the qRT-PCR threshold cycle (Ct) and the patients' lymphocyte counts. Our proposed generic method is an algorithm based on Empirical Mode Decomposition (EMD) for cough sound detection, with subsequent classification based on a tensor of audio sonographs and a deep artificial neural network classifier with convolutional layers called 'DeepCough'. Two versions of DeepCough, differing in the number of tensor dimensions, i.e., DeepCough2D and DeepCough3D, have been investigated. These methods have been deployed in a multi-platform prototype web app, 'CoughDetect'. Covid-19 recognition achieved a promising AUC (Area Under Curve) of 98.80% ± 0.83%, a sensitivity of 96.43% ± 1.85%, and a specificity of 96.20% ± 1.74%, with an average AUC of 81.08% ± 5.05% for the recognition of three severity levels. Our proposed web tool, as a point-of-need primary diagnostic test for Covid-19, facilitates the rapid detection of the infection. We believe it has the potential to significantly hamper the Covid-19 pandemic across the world.
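EMD decomposes a signal by iteratively "sifting" out intrinsic mode functions: subtracting the mean of the upper and lower extrema envelopes until the residue satisfies the IMF conditions. A simplified sketch of one sifting step (linear envelope interpolation is used here for brevity; standard EMD uses cubic splines, and this is not the paper's implementation):

```python
import numpy as np

def sift_once(x):
    """One sifting step of Empirical Mode Decomposition: subtract the mean
    of the upper and lower extrema envelopes from the signal."""
    t = np.arange(len(x))
    maxima = [i for i in range(1, len(x) - 1) if x[i] >= x[i - 1] and x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i] <= x[i - 1] and x[i] < x[i + 1]]
    if len(maxima) < 2 or len(minima) < 2:
        return x  # not enough extrema to form envelopes; treat as residue
    upper = np.interp(t, maxima, x[maxima])  # envelope through the maxima
    lower = np.interp(t, minima, x[minima])  # envelope through the minima
    return x - (upper + lower) / 2.0
```

Repeating this step until convergence yields the first IMF; subtracting it and iterating on the residue yields the full decomposition.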
